Finding Contrast Patterns for Mixed Streaming Data
Abstract
Contrast set mining identifies patterns in the data that best distinguish between groups. Most existing work focuses on categorical, batch data and does not scale well to large datasets. In this work, we focus on finding contrast patterns in mixed (quantitative and categorical) streaming data. We adapt a discretization methodology, Supervised Dynamic and Adaptive Discretization, to identify meaningful bin boundaries, and then use the discretization result to find contrast patterns on streaming data. To do so, we identify frequent items and contrast them with a group of interest. To handle potential concept drift, we propose an update strategy that keeps the frequent items relevant. In addition, our algorithm samples feature combinations based on their "sampling" score and user feedback, which reduces the search space and retrieves more interesting patterns.
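To make the pipeline in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes the bin boundaries are already given (the paper derives them with Supervised Dynamic and Adaptive Discretization, which is not reproduced here), uses a simple exponential decay as a stand-in for the drift-aware update strategy, and scores each item by the difference in support between two groups. The class name `ContrastTracker`, the `decay` parameter, and the `min_diff` threshold are all hypothetical; the sampling of feature combinations and the user feedback step are omitted.

```python
from bisect import bisect_right
from collections import defaultdict


class ContrastTracker:
    """Decayed per-group item counts over a stream (illustrative sketch only)."""

    def __init__(self, bin_edges, decay=0.99):
        self.bin_edges = bin_edges        # {feature: sorted cut points}, assumed given
        self.decay = decay                # hypothetical forgetting factor for drift
        self.counts = defaultdict(dict)   # group -> {item: decayed count}
        self.totals = defaultdict(float)  # group -> decayed number of records

    def _items(self, record):
        # Turn a mixed record into (feature, value) items.
        for feature, value in record.items():
            edges = self.bin_edges.get(feature)
            if edges is not None:
                # Quantitative feature: replace the value by its bin index.
                value = bisect_right(edges, value)
            yield (feature, value)

    def update(self, record, group):
        # Age what this group has seen so far, then add the new record.
        counts = self.counts[group]
        for item in counts:
            counts[item] *= self.decay
        self.totals[group] = self.totals[group] * self.decay + 1.0
        for item in self._items(record):
            counts[item] = counts.get(item, 0.0) + 1.0

    def support(self, item, group):
        total = self.totals.get(group, 0.0)
        if total == 0.0:
            return 0.0
        return self.counts[group].get(item, 0.0) / total

    def contrasts(self, group_a, group_b, min_diff=0.3):
        # Items whose support differs between the groups by at least min_diff,
        # largest absolute difference first.
        items = set(self.counts[group_a]) | set(self.counts[group_b])
        scored = [(i, self.support(i, group_a) - self.support(i, group_b))
                  for i in items]
        kept = [(i, d) for i, d in scored if abs(d) >= min_diff]
        return sorted(kept, key=lambda pair: -abs(pair[1]))


if __name__ == "__main__":
    tracker = ContrastTracker({"age": [40]}, decay=1.0)  # one cut point at 40
    tracker.update({"age": 25, "plan": "basic"}, "A")
    tracker.update({"age": 30, "plan": "basic"}, "A")
    tracker.update({"age": 60, "plan": "basic"}, "B")
    tracker.update({"age": 70, "plan": "pro"}, "B")
    for item, diff in tracker.contrasts("A", "B", min_diff=0.6):
        print(item, diff)
```

With `decay=1.0` nothing is forgotten and the supports are plain relative frequencies. A value below 1 lets older records fade, which is one simple way to keep the tracked items relevant under drift. This sketch scores single items only, whereas contrast patterns in general combine several feature-value pairs.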
Similar resources
Modelling and Scheduling Lot Streaming Flexible Flow Lines
Although lot streaming scheduling is an active research field, lot streaming flexible flow lines problems have received far less attention than classical flow shops. This paper deals with scheduling jobs in lot streaming flexible flow line problems. The paper mathematically formulates the problem by a mixed integer linear programming model. This model solves small instances to optimality. Moreo...
Discovering Patterns in Multiple Time-series
In the past there have been some methodologies for time-series data mining. Those earlier multiple-sequence matching mechanisms are complicated and lack comprehensive application domains, especially for multiple streaming data. Here we deal with these restrictions by introducing a novel methodology for finding multiple time-series patterns. The model is evaluated the noise b...
Fuzzy Data Envelopment Analysis for Classification of Streaming Data
The classification of fuzzy uncertain data is considered one of the most challenging issues in data analysis. In spite of the significance of fuzzy data in mathematical programming, the development of the analytical methods of fuzzy data is slow. Therefore, the current study proposes a new fuzzy data classification method based on fuzzy data envelopment analysis (DEA) which can handle strea...
Hybrid algorithms for Job shop Scheduling Problem with Lot streaming and A Parallel Assembly Stage
In this paper, a job shop scheduling problem with a parallel assembly stage and Lot Streaming (LS) is considered for the first time in both machining and assembly stages. The Lot Streaming technique is a process of splitting jobs into smaller sub-jobs such that successive operations can be overlapped. Hence, to solve the job shop scheduling problem with a parallel assembly stage and lot streaming, deci...
Finding Patterns in Large Star Schemas at the Right Aggregation Level
There are many stand-alone algorithms to mine different types of patterns in traditional databases. However, to effectively and efficiently mine databases with more complex and large data tables is still a growing challenge in data mining. The nature of data streams makes streaming techniques a promising way to handle large amounts of data, since their main ideas are to avoid multiple scans and...
Publication date: 2018